Kendall’s Tau (Rank Correlation) — Measure + Hypothesis Test#

Kendall’s tau answers a concrete ordering question:

If I pick two observations at random, how often do x and y agree on which one is larger?

It’s a non-parametric measure of monotonic association (excellent for ordinal data), and it naturally supports a hypothesis test for association / independence.


Learning goals#

By the end you can:

  • explain concordant vs discordant pairs (the entire statistic is built from this)

  • compute \(\tau\) (tau-a and tau-b) from scratch with NumPy

  • run and interpret a permutation test for H0: no association

  • interpret \(\tau\) as a probability difference (not “percent correlation”)

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)

rng = np.random.default_rng(42)

When to use Kendall’s tau#

Use Kendall’s tau when:

  • your variables are ordinal (ranks, ratings, Likert scales) or you mostly trust the ordering

  • you expect a monotonic relationship (increasing/decreasing, not necessarily linear)

  • you want a measure that is more robust to outliers than Pearson correlation (tau depends only on orderings, not magnitudes)

Common alternatives:

  • Pearson: measures linear association (sensitive to outliers; assumes the numeric scale itself is meaningful, not just the ordering)

  • Spearman’s rho: Pearson correlation computed on ranks (also targets monotonic association; weights rank discrepancies differently than tau)

Kendall’s tau is often the most interpretable when you want to reason in terms of pairwise ordering agreement.
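
To make the outlier claim concrete, here is a small self-contained sketch (the dataset and the inline `tau_a_quick` helper are illustrative stand-ins; tau-a is defined properly in the next section):

```python
import numpy as np

rng_out = np.random.default_rng(1)
n_demo = 50
x_demo = rng_out.normal(size=n_demo)
y_demo = x_demo + rng_out.normal(0, 0.5, size=n_demo)

# One wild point that contradicts the overall increasing trend
x_out = np.append(x_demo, 10.0)
y_out = np.append(y_demo, -10.0)

def tau_a_quick(x, y):
    """O(n^2) Kendall tau-a straight from the pairwise-sign definition."""
    i, j = np.triu_indices(x.size, k=1)
    return np.sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])) / i.size

def pearson_r(x, y):
    return np.corrcoef(x, y)[0, 1]

print(f"Pearson: {pearson_r(x_demo, y_demo):.3f} -> {pearson_r(x_out, y_out):.3f}")
print(f"tau-a:   {tau_a_quick(x_demo, y_demo):.3f} -> {tau_a_quick(x_out, y_out):.3f}")
```

A single adversarial point can flip the sign of Pearson’s r, but it can shift tau only by roughly the share of pairs it participates in.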

1) The core idea: concordant vs discordant pairs#

Take any pair of observations \((i, j)\).

Define the pairwise differences:

  • \(\Delta x = x_i - x_j\)

  • \(\Delta y = y_i - y_j\)

Look at the signs:

  • if \(\Delta x\) and \(\Delta y\) have the same sign, the pair is concordant

  • if they have opposite signs, the pair is discordant

  • if either difference is 0, you have a tie in \(x\), \(y\), or both

A convenient encoding is the pair contribution:

\[ \operatorname{sign}(x_i - x_j)\;\operatorname{sign}(y_i - y_j) \in \{-1, 0, +1\} \]

Summing those contributions over all pairs gives Kendall’s S statistic:

\[ S = \sum_{i<j} \operatorname{sign}(x_i - x_j)\;\operatorname{sign}(y_i - y_j) \]

From \(S\) we get tau-a (no tie correction):

\[ \tau_a = \frac{S}{\binom{n}{2}} \]

And tau-b (tie-corrected; usually preferred for ordinal/discrete data):

\[ \tau_b = \frac{S}{\sqrt{\left(\binom{n}{2} - n_1\right)\left(\binom{n}{2} - n_2\right)}} \]

where:

  • \(n_1\) is the number of pairs tied in \(x\)

  • \(n_2\) is the number of pairs tied in \(y\)

Under independence, \(S\) (and therefore \(\tau\)) is centered around 0.

def _clean_xy(x, y):
    """Return 1D arrays with rows containing NaNs removed."""

    x = np.asarray(x)
    y = np.asarray(y)

    if x.shape != y.shape:
        raise ValueError(f"x and y must have the same shape, got {x.shape} and {y.shape}.")

    x = np.ravel(x)
    y = np.ravel(y)

    # np.isnan works for numeric dtypes; for non-numeric inputs this will raise.
    mask = ~(np.isnan(x) | np.isnan(y))
    return x[mask], y[mask]


def kendall_pair_counts(x, y):
    """Compute concordant/discordant/tie counts for Kendall's tau.

    Returns a dict with:
    - n: number of observations
    - n_pairs: number of pairs (n choose 2)
    - C: #concordant
    - D: #discordant
    - T_x: #ties in x only
    - T_y: #ties in y only
    - T_xy: #ties in both x and y
    - S: C - D

    This is an O(n^2) reference implementation meant for learning.
    """

    x, y = _clean_xy(x, y)
    n = x.size

    if n < 2:
        return dict(n=int(n), n_pairs=0, C=0, D=0, T_x=0, T_y=0, T_xy=0, S=0)

    i, j = np.triu_indices(n, k=1)
    dx = x[i] - x[j]
    dy = y[i] - y[j]

    sx = np.sign(dx)
    sy = np.sign(dy)

    prod = sx * sy

    C = int(np.sum(prod > 0))
    D = int(np.sum(prod < 0))

    T_x = int(np.sum((sx == 0) & (sy != 0)))
    T_y = int(np.sum((sy == 0) & (sx != 0)))
    T_xy = int(np.sum((sx == 0) & (sy == 0)))

    S = C - D

    return dict(
        n=int(n),
        n_pairs=int(i.size),
        C=C,
        D=D,
        T_x=T_x,
        T_y=T_y,
        T_xy=T_xy,
        S=int(S),
    )


def kendall_tau_a(x, y):
    """Kendall's tau-a (no tie correction)."""

    counts = kendall_pair_counts(x, y)
    n_pairs = counts["n_pairs"]
    if n_pairs == 0:
        return np.nan, counts

    tau = counts["S"] / n_pairs
    return float(tau), counts


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected)."""

    counts = kendall_pair_counts(x, y)

    C = counts["C"]
    D = counts["D"]
    T_x = counts["T_x"]
    T_y = counts["T_y"]

    denom = np.sqrt((C + D + T_x) * (C + D + T_y))
    tau = counts["S"] / denom if denom != 0 else np.nan

    counts = {**counts, "denom": float(denom)}
    return (float(tau) if np.isfinite(tau) else np.nan), counts

2) A tiny example you can see#

We’ll use a small dataset so we can reason about pairs directly.

  • the scatter plot shows the data

  • the bar chart shows how many pairs are concordant vs discordant vs tied

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 4, 3, 5])  # one inversion (3 and 4 swap)

tau_b, counts = kendall_tau_b(x, y)

print(f"tau-b = {tau_b:.3f}")
counts
tau-b = 0.800
{'n': 5,
 'n_pairs': 10,
 'C': 9,
 'D': 1,
 'T_x': 0,
 'T_y': 0,
 'T_xy': 0,
 'S': 8,
 'denom': 10.0}
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=x,
        y=y,
        mode="markers+text",
        text=[str(i) for i in range(len(x))],
        textposition="top center",
        marker=dict(size=10),
    )
)
fig.update_layout(
    title="Tiny example (point labels are indices)",
    xaxis_title="x",
    yaxis_title="y",
)
fig.show()

labels = ["Concordant (C)", "Discordant (D)", "Tie in x (T_x)", "Tie in y (T_y)", "Tie in both (T_xy)"]
values = [counts["C"], counts["D"], counts["T_x"], counts["T_y"], counts["T_xy"]]

fig = px.bar(
    x=labels,
    y=values,
    title="Pair types that build Kendall’s tau",
    labels={"x": "pair type", "y": "count"},
)
fig.update_layout(xaxis_tickangle=-20)
fig.show()

Interpreting the sign and magnitude#

  • sign: \(\tau > 0\) means larger x tends to come with larger y (monotone increasing); \(\tau < 0\) means the opposite.

  • magnitude: in the no-ties (continuous) case, \(\tau\) has a clean probability interpretation:

\[ \tau = P(\text{concordant}) - P(\text{discordant}) \]

So if \(\tau = 0.30\) (and there are no ties), concordance is about 30 percentage points more likely than discordance for a randomly chosen pair.

Important nuance:

  • Independence implies \(\tau = 0\), but \(\tau = 0\) does not necessarily imply independence. It means “no monotone tendency detected by this statistic”.
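
A quick self-contained illustration of that nuance (illustrative data, with tau-a recomputed inline from the pairwise-sign definition): below, y is nearly a deterministic function of x, yet tau stays near zero because the relationship is not monotone.

```python
import numpy as np

rng_sq = np.random.default_rng(0)
x_sq = rng_sq.normal(size=400)
y_sq = x_sq**2 + rng_sq.normal(0, 0.1, size=400)  # almost a function of x, but not monotone

# tau-a from the pairwise-sign definition
i_sq, j_sq = np.triu_indices(x_sq.size, k=1)
tau_a_sq = np.sum(np.sign(x_sq[i_sq] - x_sq[j_sq]) * np.sign(y_sq[i_sq] - y_sq[j_sq])) / i_sq.size
print(f"tau-a = {tau_a_sq:.3f}")  # near 0 despite the strong dependence
```

Concordant and discordant pairs cancel by symmetry, so the statistic sees "no monotone tendency" even though x and y are strongly dependent.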

# Among comparable (non-tied) pairs, what fraction are concordant vs discordant?
comparable = counts["C"] + counts["D"]

p_conc = counts["C"] / comparable
p_disc = counts["D"] / comparable

print(f"Comparable pairs: {comparable} / {counts['n_pairs']} total")
print(f"P(concordant | comparable) = {p_conc:.3f}")
print(f"P(discordant | comparable) = {p_disc:.3f}")
Comparable pairs: 10 / 10 total
P(concordant | comparable) = 0.900
P(discordant | comparable) = 0.100

3) Ties (and why tau-b exists)#

With ordinal/discrete data, ties are common. Ties create a practical issue:

  • tau-a divides by the total number of pairs \(\binom{n}{2}\), even though many pairs might be “uninformative” because of ties

  • as a result, even a perfectly monotone relationship with many ties can have \(|\tau_a| < 1\)

Tau-b fixes this by rescaling based on how many pairs are actually comparable in \(x\) and in \(y\).

# An ordinal-ish example with ties
x_tie = np.array([1, 1, 2, 2, 3, 3])
y_tie = np.array([1, 1, 2, 3, 3, 3])

tau_a, counts_a = kendall_tau_a(x_tie, y_tie)
tau_b, counts_b = kendall_tau_b(x_tie, y_tie)

print(f"tau-a = {tau_a:.3f}")
print(f"tau-b = {tau_b:.3f}")
counts_b
tau-a = 0.667
tau-b = 0.870
{'n': 6,
 'n_pairs': 15,
 'C': 10,
 'D': 0,
 'T_x': 1,
 'T_y': 2,
 'T_xy': 2,
 'S': 10,
 'denom': 11.489125293076057}
# Visualize ties with a little jitter so points don't sit exactly on top of each other
jitter = 0.06
xj = x_tie + rng.normal(0, jitter, size=x_tie.size)
yj = y_tie + rng.normal(0, jitter, size=y_tie.size)

fig = px.scatter(
    x=xj,
    y=yj,
    title="Ordinal data with ties (visualized with small jitter)",
    labels={"x": "x (jittered)", "y": "y (jittered)"},
)
fig.add_annotation(
    x=0.02,
    y=0.98,
    xref="paper",
    yref="paper",
    showarrow=False,
    align="left",
    text=f"tau-a = {tau_a:.3f}<br>tau-b = {tau_b:.3f}",
)
fig.show()

4) Visual intuition: the concordance matrix#

For a small dataset, you can visualize every pair’s contribution.

We build a matrix:

\[ M_{ij} = \operatorname{sign}(x_i - x_j)\;\operatorname{sign}(y_i - y_j) \]

  • +1 (red) means the pair is concordant

  • -1 (blue) means the pair is discordant

  • 0 (white) means a tie in x or y (or the diagonal)

This is literally what \(S\) sums over (for \(i < j\)).

n_small = 12
x_small = np.arange(n_small)
# Mostly increasing, but with noise so we get some discordant pairs
y_small = x_small + rng.normal(0, 2.0, size=n_small)

# Build the full matrix for visualization (O(n^2) but tiny here)
dx = x_small[:, None] - x_small[None, :]
dy = y_small[:, None] - y_small[None, :]
M = np.sign(dx) * np.sign(dy)
np.fill_diagonal(M, 0)

fig = px.imshow(
    M,
    zmin=-1,
    zmax=1,
    color_continuous_scale="RdBu_r",  # reversed so +1 (concordant) renders red, matching the text
    title="Concordance matrix M (red=concordant, blue=discordant)",
    labels=dict(x="j", y="i", color="sign"),
)
fig.update_layout(coloraxis_colorbar=dict(tickvals=[-1, 0, 1]))
fig.show()

# Check that summing the upper triangle matches S
_, counts_small = kendall_tau_a(x_small, y_small)
S_from_matrix = int(np.sum(np.triu(M, k=1)))
print("S (from counts):", counts_small["S"])
print("S (from matrix):", S_from_matrix)
S (from counts): 48
S (from matrix): 48

5) Tau cares about order, not the scale#

Because tau is built from comparisons (x_i > x_j?), it is invariant to strictly monotone transformations.

Example: if you replace \(x\) with \(\exp(x)\) (strictly increasing), the ordering doesn’t change — and tau doesn’t change.

This is a big reason tau is popular for:

  • ordinal scales

  • heavy-tailed data

  • relationships that are monotonic but not linear

# Nonlinear but monotonic relationship
n = 80
x_nl = rng.normal(size=n)
y_nl = x_nl**3 + rng.normal(0, 1.5, size=n)

tau_raw, _ = kendall_tau_b(x_nl, y_nl)
tau_exp, _ = kendall_tau_b(np.exp(x_nl), y_nl)

pearson = np.corrcoef(x_nl, y_nl)[0, 1]

print(f"Kendall tau-b (x, y)      = {tau_raw:.3f}")
print(f"Kendall tau-b (exp(x), y) = {tau_exp:.3f}")
print(f"Pearson corr (x, y)       = {pearson:.3f}")

fig = px.scatter(
    x=x_nl,
    y=y_nl,
    title="Monotonic but nonlinear relationship (y = x^3 + noise)",
    labels={"x": "x", "y": "y"},
)
fig.add_annotation(
    x=0.02,
    y=0.98,
    xref="paper",
    yref="paper",
    showarrow=False,
    align="left",
    text=f"Kendall tau-b = {tau_raw:.3f}<br>Pearson r = {pearson:.3f}",
)
fig.show()
Kendall tau-b (x, y)      = 0.309
Kendall tau-b (exp(x), y) = 0.309
Pearson corr (x, y)       = 0.589

6) Hypothesis testing: is the association more than chance?#

A common hypothesis test is:

  • H0: \(X\) and \(Y\) are independent (no association)
    (under H0 the expected tau is 0)

  • H1: there is an association (two-sided), or specifically increasing/decreasing (one-sided)

Interpreting the test#

  • A small p-value means the observed ordering agreement (tau) is unlikely under H0.

  • Always report tau itself as the effect size.

Two practical reminders:

  1. With large samples, even tiny tau values can be “statistically significant”.

  2. With many ties (discrete data), prefer tau-b and/or a permutation test.
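
The permutation test referenced in the learning goals and in exercise 2 (`kendall_permutation_test`) is not shown in this excerpt, and later cells also use `x_ex`, `y_ex`, and `tau_obs`. Below is a minimal self-contained sketch; the `x_ex` / `y_ex` generated here are stand-ins, so the numbers printed later in this section (which came from the original data) will not match exactly.

```python
import numpy as np

def _tau_b_fast(x, y):
    """Compact O(n^2) tau-b (same logic as kendall_tau_b above)."""
    i, j = np.triu_indices(x.size, k=1)
    sx = np.sign(x[i] - x[j])
    sy = np.sign(y[i] - y[j])
    S = np.sum(sx * sy)
    # pairs comparable in x, and pairs comparable in y
    denom = np.sqrt(np.sum(sx != 0) * np.sum(sy != 0))
    return S / denom if denom > 0 else np.nan

def kendall_permutation_test(x, y, *, n_perm=2000, rng=None):
    """Two-sided permutation test of H0: no association (shuffle y)."""
    if rng is None:
        rng = np.random.default_rng()
    tau_obs = _tau_b_fast(x, y)
    tau_perm = np.empty(n_perm)
    for b in range(n_perm):
        tau_perm[b] = _tau_b_fast(x, rng.permutation(y))
    # "+1" correction: the observed arrangement counts as one permutation
    p = (1 + np.sum(np.abs(tau_perm) >= np.abs(tau_obs))) / (n_perm + 1)
    return float(tau_obs), float(p), tau_perm

# Stand-in example data (the original x_ex / y_ex are not shown in this excerpt)
rng_ex = np.random.default_rng(7)
x_ex = rng_ex.normal(size=60)
y_ex = 0.8 * x_ex + rng_ex.normal(0, 0.8, size=60)

tau_obs, p_perm, tau_perm = kendall_permutation_test(x_ex, y_ex, n_perm=2000, rng=rng_ex)
print(f"observed tau-b = {tau_obs:.3f}")
print(f"permutation p  = {p_perm:.4f}")
```

The idea: shuffling y breaks any x-y association while preserving both marginal distributions, so the shuffled taus approximate the null distribution of tau-b under H0.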

7) (Optional) Large-sample normal approximation (no ties)#

For the continuous case (no ties), Kendall’s \(S\) has an asymptotic normal distribution under H0.

For sample size \(n\) (no ties):

\[ \operatorname{Var}(S) = \frac{n(n-1)(2n+5)}{18} \]

So a z-score is:

\[ Z = \frac{S}{\sqrt{\operatorname{Var}(S)}} \]

This is fast, but:

  • it’s only accurate-ish for larger \(n\)

  • tie corrections make the variance formula more complicated

  • permutation tests are usually easier to trust when learning

import math


def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kendall_tau_a_asymptotic_test(x, y, *, alternative="two-sided"):
    """Asymptotic z-test using S variance formula (no ties)."""

    _, counts = kendall_tau_a(x, y)
    n = counts["n"]
    S = counts["S"]

    if n < 2:
        return np.nan, np.nan

    var_s = n * (n - 1) * (2 * n + 5) / 18
    z = S / math.sqrt(var_s)

    alternative = alternative.lower()
    if alternative == "two-sided":
        p = 2 * (1 - _normal_cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - _normal_cdf(z)
    elif alternative == "less":
        p = _normal_cdf(z)
    else:
        raise ValueError("alternative must be one of: 'two-sided', 'greater', 'less'.")

    return float(z), float(p)


# Compare (roughly) to permutation on the same example when there are effectively no ties
z, p_asym = kendall_tau_a_asymptotic_test(x_ex, y_ex)
print(f"Asymptotic z (tau-a) = {z:.3f}")
print(f"Asymptotic p-value   = {p_asym:.4f}")
Asymptotic z (tau-a) = 7.654
Asymptotic p-value   = 0.0000

8) Bootstrap confidence interval (effect size uncertainty)#

A p-value answers “is it plausible tau is 0?”, but you often also want:

  • an uncertainty interval for tau itself

A simple approach is a bootstrap:

  1. resample the dataset with replacement

  2. recompute tau for each bootstrap sample

  3. take percentiles for a CI

(Like the permutation test, this is straightforward to implement and visualize.)

def bootstrap_tau_b(x, y, *, n_boot=2000, ci=0.95, rng=None):
    if rng is None:
        rng = np.random.default_rng()

    x, y = _clean_xy(x, y)
    n = x.size

    tau_samples = np.empty(n_boot, dtype=float)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        tau_samples[b], _ = kendall_tau_b(x[idx], y[idx])

    alpha = 1 - ci
    lo = np.quantile(tau_samples, alpha / 2)
    hi = np.quantile(tau_samples, 1 - alpha / 2)
    return tau_samples, float(lo), float(hi)


tau_boot, lo, hi = bootstrap_tau_b(x_ex, y_ex, n_boot=3000, rng=rng)
print(f"Bootstrap 95% CI for tau-b: [{lo:.3f}, {hi:.3f}]")

fig = px.histogram(
    tau_boot,
    nbins=60,
    title="Bootstrap distribution of Kendall tau-b",
    labels={"value": "tau-b (bootstrap)"},
)
fig.add_vline(x=lo, line_color="black", line_dash="dot", annotation_text="CI low")
fig.add_vline(x=hi, line_color="black", line_dash="dot", annotation_text="CI high")
fig.add_vline(x=tau_obs, line_color="crimson", annotation_text="observed")
fig.show()
Bootstrap 95% CI for tau-b: [0.577, 0.765]

9) Diagnostics and pitfalls#

  • Independence of observations matters. If you have time series or repeated measures, tau’s usual p-values can be misleading.

  • Ties are common in ordinal data → prefer tau-b.

  • Effect size vs significance: don’t stop at “p < 0.05”. Report tau (and ideally a CI).

  • Complexity: this reference implementation is \(O(n^2)\). For large datasets, use an optimized library implementation.
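
As a cross-check against an optimized implementation (this assumes SciPy is installed; `scipy.stats.kendalltau` computes tau-b by default, with illustrative Likert-style data below):

```python
import numpy as np
from scipy import stats

rng_chk = np.random.default_rng(3)
x_likert = rng_chk.integers(1, 6, size=200)  # 1..5 scores, lots of ties
y_likert = np.clip(x_likert + rng_chk.integers(-1, 2, size=200), 1, 5)

# tau-b straight from the pairwise-sign definition (O(n^2))
i_c, j_c = np.triu_indices(x_likert.size, k=1)
sx_c = np.sign(x_likert[i_c] - x_likert[j_c])
sy_c = np.sign(y_likert[i_c] - y_likert[j_c])
tau_ref = np.sum(sx_c * sy_c) / np.sqrt(np.sum(sx_c != 0) * np.sum(sy_c != 0))

tau_scipy, p_scipy = stats.kendalltau(x_likert, y_likert)
print(f"tau-b (definition) = {tau_ref:.6f}")
print(f"tau-b (SciPy)      = {tau_scipy:.6f}")
```

The two values should agree to floating-point precision; SciPy additionally returns a p-value and scales to large n.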

Exercises#

  1. Create a dataset where Pearson correlation is near 0 but Kendall tau is clearly non-zero (hint: monotone but nonlinear).

  2. Modify kendall_permutation_test to support a one-sided alternative and verify it behaves as expected.

  3. Stress-test the \(O(n^2)\) implementation with increasing n and plot runtime.

References#

  • Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81–93.

  • SciPy: scipy.stats.kendalltau (for a production-ready implementation and additional details).